The analysis of driving fatalities in the UK (1969-1984)

Introduction

Let’s start by visualising the data.

The code below plots the raw data. From this we should be able to assess the general trend and seasonality of our data set.

plot(UKDriverDeaths)

Noticeably, there is an overall decline after the mid-1970s. This could be due to the introduction of road safety measures such as the “seat belt law”, which came into force in 1983. There are also repeating peaks each year, which suggests a seasonal pattern, with more deaths in certain months (e.g. in winter or summer). Weather conditions affect road safety, for example heavy rainfall reducing tyre grip. The fluctuations in deaths are more prominent in the early years (1969-1975) and appear to stabilise later on. Returning to the seat belt law, there is a noticeable decrease in fatalities around 1983, which is consistent with the law having an effect. Other potentially important factors include the oil crises of 1973 and 1979; the graph shows noticeable dips around these periods.
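To complement reading the trend and seasonality off the raw plot, the series can be split into those components. The following is a minimal sketch using base R's decompose(); the additive form is an assumption made here for illustration rather than something tested in this analysis, and the names deaths_decomp and seasonal_effects are introduced only for this example.

```r
# Classical additive decomposition of the monthly series into
# trend, seasonal and remainder components (base R, stats package).
deaths_decomp <- decompose(UKDriverDeaths, type = "additive")

# One panel per component: observed, trend, seasonal, random
plot(deaths_decomp)

# The 12 estimated seasonal effects, one per calendar month
# (the series starts in January, so the first effect is January's)
seasonal_effects <- deaths_decomp$figure
names(seasonal_effects) <- month.abb
print(round(seasonal_effects))
```

Positive effects mark months that sit above the trend on average and negative effects mark months below it, which gives a numerical counterpart to the repeating peaks seen in the plot.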

Box Plot

Now we’re going to use the following code to produce a monthly box plot of the data set.

boxplot(UKDriverDeaths ~ cycle(UKDriverDeaths),
        main = "Boxplot of UK Driver Deaths by Month",
        xlab = "Month", ylab = "Number of Deaths", col = "pink")

Overall Trend: The box plot reveals a clear seasonal pattern in UK driver deaths. The number of deaths tends to be lower in the earlier months of the year (January-April) and higher in the later months (October-December).

Median: The median number of deaths (the horizontal line within each box) generally increases from January to December. The month with the lowest median appears to be April (Box 4), at around 1400, and the highest is December (Box 12), at around 2200.

Spread (IQR): The interquartile range (IQR), represented by the box height, varies across the months. The IQR is narrower in the earlier months, suggesting less variability in driver deaths during that period. In contrast, months like November and December have wider boxes, indicating greater variability.

Whiskers: The whiskers show the typical range of driver deaths for each month, extending to the most extreme observation that lies within 1.5 times the IQR of the box. Longer whiskers, such as those in October and December, suggest a wider range of potential death counts, possibly due to varying weather conditions or increased holiday travel, as expected from the basic time series plot.

Outliers: There are several outliers (points beyond the whiskers). April (Box 4) has a few outliers, and October (Box 10), November (Box 11) and December (Box 12) have outliers on the high side. These represent months with unusually low or high numbers of driver deaths, potentially due to specific events such as unusually good or bad weather, major holidays or policy changes.
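The medians, IQRs and outliers described above are read off the plot by eye, so it may help to compute them directly. The sketch below uses base R only; the names by_month, monthly_summary and outliers_by_month are introduced here for illustration, and boxplot.stats() is used on the assumption that the box plot was drawn with the default 1.5 * IQR whisker rule.

```r
# Monthly summaries behind the box plot (base R only).
deaths <- as.numeric(UKDriverDeaths)
month  <- factor(as.integer(cycle(UKDriverDeaths)),
                 levels = 1:12, labels = month.abb)
by_month <- split(deaths, month)

# Median and interquartile range for each calendar month
monthly_summary <- data.frame(
    median = sapply(by_month, median),
    iqr    = sapply(by_month, IQR)
)
print(round(monthly_summary))

# Points beyond the whiskers for each month, using the default
# 1.5 * IQR rule that boxplot() applies
outliers_by_month <- lapply(by_month, function(x) boxplot.stats(x)$out)
print(outliers_by_month)
```

Comparing the printed medians and IQRs with the approximate values quoted above is a quick way to check the visual reading of the boxes.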

Let’s summarise the overall seasonal pattern and address potential factors. Poorer weather conditions during autumn and winter (October-December), such as rain, snow and ice, can increase the risk of accidents. Shorter daylight hours during these months reduce visibility and contribute to higher accident rates. In addition, increased travel during the holiday season (particularly in December) may further raise the number of deaths.

Linear Regression Model

To further solidify our understanding of the trend, we will perform a linear regression with the following code:

time_values <- as.numeric(time(UKDriverDeaths))
linear_model <- lm(UKDriverDeaths ~ time_values)

# Plot the linear regression line
plot(time(UKDriverDeaths), UKDriverDeaths, type = "l",
     main = "Linear Regression of UK Driver Deaths",
     xlab = "Year", ylab = "Number of Deaths")
abline(linear_model, col = "pink")

This linear regression plot shows the overall trend in UK Driver Deaths over time.

The downward-sloping regression line suggests a decreasing trend in driver deaths from 1969 to 1984. The observations fluctuate around the line and show a peak in each year, once again highlighting the seasonal pattern seen in the box plot.
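The size of the downward trend can be read from the fitted slope rather than judged from the plot. This is a minimal sketch that refits the same model so the block runs on its own; the names slope and slope_ci are introduced here for illustration. The confidence interval assumes independent errors, which a seasonal series is unlikely to satisfy, so it should be treated as indicative only.

```r
# Refit the linear trend model so this block is self-contained
time_values  <- as.numeric(time(UKDriverDeaths))
linear_model <- lm(UKDriverDeaths ~ time_values)

# Estimated change in monthly driver deaths per calendar year
slope <- coef(linear_model)[["time_values"]]
print(slope)

# 95% confidence interval for the slope
# (assumes independent errors; indicative only for seasonal data)
slope_ci <- confint(linear_model, "time_values", level = 0.95)
print(slope_ci)
```

A negative slope whose interval excludes zero would support the decreasing trend described above.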

The decreasing trend may be due to improvements in vehicle safety standards, road infrastructure and traffic regulations during this period.

The Road Safety Act 1967 made it an offence to drive a vehicle with a blood alcohol concentration (BAC) in excess of 80mg of alcohol per 100ml of blood.

With this rule being enforced, it is likely that it, combined with the seat belt law in 1983, contributed to the overall decline in driver deaths.

Breusch-Pagan Test

The fluctuations appear to grow as the year goes on, with more variability in the later months. This raises a concern about heteroscedasticity, so I will perform a Breusch–Pagan test with the following code:

# Perform the Breusch-Pagan test (bptest() comes from the lmtest package)
library(lmtest)
bptest_result <- bptest(linear_model)

# Print the result of the Breusch-Pagan test
print(bptest_result)
## 
##  studentized Breusch-Pagan test
## 
## data:  linear_model
## BP = 1.5898, df = 1, p-value = 0.2074
# Interpret the result
if (bptest_result$p.value < 0.05) {
    cat("There is significant evidence of heteroscedasticity (p-value < 0.05)\n")
} else {
    cat("There is no significant evidence of heteroscedasticity (p-value >= 0.05)\n")
}
## There is no significant evidence of heteroscedasticity (p-value >= 0.05)

With a p-value of 0.2074 (> 0.05), we fail to reject the null hypothesis of constant variance, so there is no significant evidence of heteroscedasticity and I can be reasonably confident in the results of the linear regression model. The residual variance does not appear to change over time in a way that would affect the reliability of the model. Note that the test relates the variance only to the model’s regressor (time), so it does not rule out differences in variability between seasons.
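To make clear what the test measures, the studentized Breusch–Pagan statistic can be rebuilt by hand: regress the squared residuals on the model's regressor and multiply the R-squared of that auxiliary regression by the sample size. The sketch below uses base R only and refits the model so it runs on its own; the names squared_resid, aux_model, bp_stat and bp_pvalue are introduced here for illustration. It should come out close to the BP and p-value printed above.

```r
# Refit the linear trend model so this block is self-contained
time_values  <- as.numeric(time(UKDriverDeaths))
linear_model <- lm(UKDriverDeaths ~ time_values)

# Auxiliary regression: squared residuals on the model's regressor
squared_resid <- as.numeric(residuals(linear_model))^2
aux_model     <- lm(squared_resid ~ time_values)

# Studentized (Koenker) statistic: n * R^2 of the auxiliary regression
n       <- length(squared_resid)
bp_stat <- n * summary(aux_model)$r.squared

# Compare against a chi-squared distribution with 1 degree of freedom
# (one regressor besides the intercept)
bp_pvalue <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(BP = bp_stat, p.value = bp_pvalue))
```

Because the auxiliary regression contains only time, a small statistic says the squared residuals show no clear linear relationship with time; it says nothing about whether they differ by month.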

Forecast Model

A forecasting model will allow us to predict whether the patterns we’ve established will persist in the future.

The code below converts the UKDriverDeaths time series into a data frame compatible with Prophet. The “ds” column represents dates, while the “y” column contains the number of driver deaths. Prophet requires this specific format to perform forecasting.

UKDriverDeaths.df = data.frame(
    ds=zoo::as.yearmon(time(UKDriverDeaths)),
    y=UKDriverDeaths)
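If the zoo package is not available, a similar data frame can be built with base R dates. This is a sketch of an alternative, not the construction used in this report: it stamps each observation with the first day of its month, which is an assumption (the series itself records only year and month), and deaths_df is a name introduced here for illustration.

```r
# Alternative construction of the Prophet input using base R dates only.
# Each observation is stamped with the first day of its month (an assumption).
n_obs       <- length(UKDriverDeaths)
start_year  <- start(UKDriverDeaths)[1]
start_month <- start(UKDriverDeaths)[2]

deaths_df <- data.frame(
    ds = seq(as.Date(sprintf("%d-%02d-01", start_year, start_month)),
             by = "month", length.out = n_obs),
    y  = as.numeric(UKDriverDeaths)
)

# Check the structure and the date range covered
str(deaths_df)
print(range(deaths_df$ds))
```

Storing y as a plain numeric vector rather than a ts object keeps the data frame free of time series attributes, which can make it easier to inspect.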

Next, we build the forecasting model using Prophet.

UKDriver_model = prophet::prophet(UKDriverDeaths.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
UKDriver_forecast = prophet::make_future_dataframe(UKDriver_model, periods=24, freq="quarter")

UKDriver_predict = predict(UKDriver_model, UKDriver_forecast)

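# plot() for prophet objects takes xlabel/ylabel; the title is added with ggtitle()
plot(UKDriver_model, UKDriver_predict,
     xlabel = "Year", ylabel = "Number of Deaths") +
    ggplot2::ggtitle("Forecast of UK Driver Deaths")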

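The call to make_future_dataframe generates a future data frame with 24 additional periods for forecasting. The freq = "quarter" argument spaces these at quarterly intervals, so the forecast extends six years beyond the data, to the end of 1990. Note that the dataset itself is monthly, so the forecast is evaluated at a coarser frequency than the observations.

It’s worth mentioning that Prophet disabled daily and weekly seasonality automatically, as the messages above show. This happens because monthly observations are too widely spaced to estimate those patterns; it is not the result of a test for them. Yearly seasonality remains enabled, and it is this component that captures the month-to-month pattern.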
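To see which dates a horizon of 24 quarters covers, the future dates can be sketched with base R's seq(). This is only an illustration of the date arithmetic, assuming the last observation is stamped 1 December 1984; the names last_obs, quarterly_dates and monthly_dates are introduced here and are not part of Prophet.

```r
# Last observed month in the series, stamped as the first day of the month
# (an assumption about how the dates are represented)
last_obs <- as.Date("1984-12-01")

# 24 steps of one quarter each, starting after the last observation
quarterly_dates <- seq(last_obs, by = "quarter", length.out = 25)[-1]

# For comparison: the same six-year horizon at the data's monthly frequency
monthly_dates <- seq(last_obs, by = "month", length.out = 73)[-1]

print(range(quarterly_dates))
print(c(quarterly = length(quarterly_dates), monthly = length(monthly_dates)))
```

The quarterly grid reaches the same end date with a third as many points, which is why the forecast region of the plot is sparser than the historical data.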

This plot visualises the forecast number of UK driver deaths from approximately 1969 to 1990, based on historical data from January 1969 to December 1984.

The plot shows the actual data points (black dots), the fitted and forecast values (blue line), and the uncertainty intervals (shaded blue area).

General Fit: The overall trend and seasonal patterns discussed previously have been captured reasonably well. The blue line seems to be a suitable fit for the data, represented by the black dots.

Downward Trend: The forecast continues the downward trend that was highlighted in the linear regression model. This suggests that, if the factors behind the decline in previous years persist, fatalities are likely to keep falling.

Seasonality: Although the time axis of the forecast plot is labelled by year, zooming into the graph still reveals the seasonal pattern within each year.

This aligns with the seasonal pattern observed in the box plot analysis.

Uncertainty intervals: The uncertainty intervals (shaded blue area) widen as the forecast extends further into the future. This indicates that the model is less confident in its predictions for more distant time periods, which is expected given the general uncertainty in forecasting.

Underestimation: The forecast may be underestimating the data in the later years (1980s). The blue line seems to be consistently below some of the highest data points. This could be due to the model not fully capturing the magnitude of the seasonal fluctuations or the influence of specific events.

Bubble Graph

Higher deaths in early years: The largest bubbles (indicating higher fatalities) are mostly in the earlier years (1970s), with colours closer to the light shades of the colour scale. This suggests that deaths were significantly higher in the early years and likely reduced over time. This is something we saw in the original raw plot that showed the general trend.

Steady decline in deaths over time: As the colour becomes darker towards 1984, the bubbles tend to shrink, indicating a reduction in fatalities. We considered earlier the effect that improved safety laws had on driving fatalities, and this could also be relevant here. Once again, the downward trend is confirmed visually by the bubble chart, as expected.
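The code for the bubble chart is not shown in this report, so the following is only a sketch of how a comparable chart might be drawn with base R's symbols(). The layout (year on the x-axis, month on the y-axis), the palette, and the names bubble_df, palette_fn, shade_idx and bubble_col are all assumptions made for illustration; bubble size and colour both follow the number of deaths, with lighter shades for higher counts.

```r
# Hypothetical layout: one bubble per month, x = year, y = month,
# with bubble area and colour both driven by the number of deaths.
bubble_df <- data.frame(
    year   = floor(as.numeric(time(UKDriverDeaths)) + 1e-8),
    month  = as.integer(cycle(UKDriverDeaths)),
    deaths = as.numeric(UKDriverDeaths)
)

# Dark-to-light palette: darker = fewer deaths, lighter = more deaths
palette_fn <- colorRampPalette(c("navy", "lightblue"))
shade_idx  <- cut(bubble_df$deaths, breaks = 100, labels = FALSE)
bubble_col <- palette_fn(100)[shade_idx]

# symbols() scales circles by radius, so sqrt() makes area track deaths
symbols(bubble_df$year, bubble_df$month,
        circles = sqrt(bubble_df$deaths), inches = 0.12,
        bg = bubble_col, fg = "grey30",
        main = "UK Driver Deaths by Year and Month",
        xlab = "Year", ylab = "Month")
```

With this layout, a column of large, light bubbles marks a year with many deaths, and a row of large bubbles marks a month that is consistently high across years.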

Conclusion

The analysis of UK driver deaths revealed seasonal patterns, with peaks and troughs recurring annually. This seasonality was captured well by the Prophet model, demonstrating its ability to handle time series data with periodic fluctuations.

The overall downward trend in fatalities aligns with improvements in road safety measures and vehicle technology over time.

Using Prophet, we produced a forecast for 24 future quarters. The forecast predicts that the downward trend is likely to persist, and Prophet’s yearly seasonality component lets the model account for the within-year pattern, providing a realistic view of future trends.

Impact of external factors: Historical events like the oil crises in 1973 and 1979 may have influenced driving behaviour and fatalities. While the model doesn’t explicitly account for external factors, the observed dips during these periods align with expectations.

By leveraging Prophet’s features (e.g., make_future_dataframe and predict), we were able to generate forecasts and gain deeper insight into the dataset.